Optimal Cross-Validation Split Ratio: Experimental Investigation
Abstract
Cross-validation is a widespread method for assessing the generalisation ability of a model in order to tune a regularisation parameter or other hyper-parameters of a learning process. Using cross-validation requires setting yet another parameter, the split ratio. Few studies have investigated the asymptotic setting of this ratio theoretically, and no consensus has emerged. In this contribution, we investigate the sensitivity and optimal setting of the split ratio on a particular model, a non-parametric kernel estimator with adaptive metric.

1 Cross-validation

Most efficient learning procedures require the setting of an extra learning parameter, or "hyper-parameter". Neural networks typically use a regularisation parameter weighting a weight decay [1], or the extent of pruning [2]. Estimating the "optimal" hyper-parameter is the topic of active current research in the statistical learning community [3]. Let us consider a typical learning problem: modelling an input-output relationship based on some empirical data
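As a rough illustration of where the split ratio enters, the sketch below tunes a single hyper-parameter by repeated random hold-out validation. It is not the estimator studied in the paper: a plain Gaussian Nadaraya-Watson smoother with a fixed bandwidth stands in for the adaptive-metric kernel estimator, the data are synthetic, and every name (`kernel_predict`, `holdout_error`, `select_bandwidth`, `make_data`, `split_ratio`) is hypothetical. Here `split_ratio` is assumed to mean the fraction of the data used for training.

```python
import math
import random


def kernel_predict(x, train, bandwidth):
    """Nadaraya-Watson estimate at x with a Gaussian kernel (assumed stand-in model)."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi, _ in train]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed to zero: fall back to the training mean.
        return sum(yi for _, yi in train) / len(train)
    return sum(w * yi for w, (_, yi) in zip(weights, train)) / total


def holdout_error(data, bandwidth, split_ratio, rng):
    """Mean squared validation error for one random train/validation split.

    split_ratio is the fraction of the data kept for training (an assumption
    of this sketch); the remainder is used for validation.
    """
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Keep at least one point on each side of the split.
    n_train = max(1, min(len(shuffled) - 1, round(split_ratio * len(shuffled))))
    train, valid = shuffled[:n_train], shuffled[n_train:]
    errors = [(kernel_predict(x, train, bandwidth) - y) ** 2 for x, y in valid]
    return sum(errors) / len(errors)


def select_bandwidth(data, bandwidths, split_ratio, n_repeats=20, seed=0):
    """Return the bandwidth with the lowest average hold-out error, plus all scores."""
    rng = random.Random(seed)
    scores = {}
    for h in bandwidths:
        scores[h] = sum(
            holdout_error(data, h, split_ratio, rng) for _ in range(n_repeats)
        ) / n_repeats
    best = min(scores, key=scores.get)
    return best, scores


def make_data(n=100, noise=0.2, seed=1):
    """Synthetic input-output pairs: a noisy sine curve (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, noise)) for x in xs]


if __name__ == "__main__":
    data = make_data()
    for ratio in (0.5, 0.7, 0.9):
        best, _ = select_bandwidth(data, [0.05, 0.2, 0.5, 1.0, 2.0], ratio)
        print(f"split ratio {ratio:.1f}: selected bandwidth {best}")
```

Running the selection for several values of `split_ratio` shows the trade-off in question: a larger training fraction gives a better-fitted model but a noisier validation estimate, and the selected hyper-parameter may change accordingly.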
Related articles
On Optimal Data Split for Generalization Estimation and Model Selection
Modeling with flexible models, such as neural networks, requires careful control of the model complexity and generalization ability of the resulting model. Whereas general asymptotic estimators of generalization ability have been developed over recent years (e.g., [9]), it is widely acknowledged that in most modeling scenarios there isn't sufficient data available to reliably use these estimato...
Development of Flow within a Diffusing C-Duct – Experimental Investigation and Numerical Validation
Experimental investigation of flow development within a rectangular 90° curved diffusing C-duct of low aspect ratio and area ratio of 2 was carried out and the three-dimensional computational results are then compared with the experimental results for numerical validation. All measurements were made in a turbulent flow regime (Re = 2.35 × 10^5), based on the duct inlet hydraulic diameter (dh = 0....
A toolkit for cross-validation: The R package cvTools
The idea of cross-validation is simple and easy to implement: split the data into several blocks, leave out one block for model estimation, and predict the values of the left-out block. Those predictions are then used to compute a certain prediction loss function. Even though the basic procedure is simple, some additional programming effort is necessary for more complex procedures such as repea...
Parallel Sampling of HDPs using Sub-Cluster Splits
We develop a sampling technique for Hierarchical Dirichlet process models. The parallel algorithm builds upon [1] by proposing large split and merge moves based on learned sub-clusters. The additional global split and merge moves drastically improve convergence in the experimental results. Furthermore, we discover that cross-validation techniques do not adequately determine convergence, and tha...
Determining optimal value of the shape parameter $c$ in RBF for unequal distances topographical points by Cross-Validation algorithm
Several radial basis function based methods contain a free shape parameter which has a crucial role in the accuracy of the methods. Performance evaluation of this parameter in different functions with various data has always been a topic of study. In the present paper, we consider studying the methods which determine an optimal value for the shape parameter in interpolations of radial basis ...